Abstract

Confidence sets based on sparse estimators are shown to be large com- 
pared to more standard confidence sets, demonstrating that sparsity of 
an estimator comes at a substantial price in terms of the quality of the 
estimator. The results are set in a general parametric or semiparametric 
framework. 
1 Introduction

Sparse estimators have received increased attention in the statistics literature 
in recent years. An estimator for a parameter vector is called sparse if it es- 
timates the zero components of the true parameter vector by zero with prob- 
ability approaching one as sample size increases without bound. Examples of 
sparse estimators are (i) post-model-selection estimators following a consistent
model selection procedure, (ii) thresholding estimators with a suitable choice of 
the thresholds, and (iii) many penalized maximum likelihood estimators (e.g., 
SCAD, LASSO, and variants thereof) when the regularization parameter is cho- 
sen in a suitable way. Many (but not all) of these sparse estimators also have 
the property that the asymptotic distribution of the estimator coincides with 
the asymptotic distribution of the (infeasible) estimator that uses the zero
restrictions in the true parameter; see, e.g., Pötscher (1991, Lemma 1), Fan and Li
(2001). This property has, in the context of SCAD estimation, been dubbed
the "oracle" property by Fan and Li (2001) and has received considerable
attention in the literature, witnessed by a series of papers establishing the "oracle"
property for a variety of estimators (e.g., Bunea (2004), Bunea and McKeague 
(2005), Fan and Li (2002, 2004), Zou (2006), Li and Liang (2007), Wang and 
Leng (2007), Wang, G. Li, and Tsai (2007), Wang, R. Li, and Tsai (2007), 
Zhang and Lu (2007), Zou and Yuan (2008)). 

The sparsity property and the closely related "oracle" property seem to 
intimate that an estimator enjoying these properties is superior to classical es- 
timators like the maximum likelihood estimator (not possessing the "oracle" 
property). We show, however, that the sparsity property of an estimator does 
not translate into good properties of confidence sets based on this estimator. 
Rather we show in Section 2 that any confidence set based on a sparse estima- 
tor is necessarily large relative to more standard confidence sets, e.g., obtained 
from the maximum likelihood estimator, that have the same guaranteed cov- 
erage probability. Hence, there is a substantial price to be paid for sparsity, 
which is not revealed by the pointwise asymptotic analysis underlying the "or- 
acle" property. Special cases of the general results provided in Section 2 have 
been observed in the literature: It has been noted that the "naive" confidence 
interval centered at Hodges' estimator has infimal coverage probability that con- 
verges to zero as sample size goes to infinity, see Kale (1985), Beran (1992), and 
Kabaila (1995). [By the "naive" confidence interval we mean the interval one 
would construct in the usual way from the pointwise asymptotic distribution of 
Hodges' estimator.] Similar results for "naive" confidence intervals centered at 
post-model-selection estimators that are derived from certain consistent model 
selection procedures can be found in Kabaila (1995) and Leeb and Pötscher
(2005). We note that these "naive" confidence intervals have coverage prob- 
abilities that converge to the nominal level pointwise in the parameter space, 
but these confidence intervals are - in view of the results just mentioned - not 
"honest" in the sense that the infimum over the parameter space of the cover- 
age probabilities converges to a level that is below the nominal level. Properties 
of confidence sets based on not necessarily sparsely tuned post-model-selection 
estimators are discussed in Kabaila (1995, 1998), Pötscher (1995), Leeb and
Pötscher (2005), Kabaila and Leeb (2006).

The results discussed in the preceding paragraph show, in particular, that the 
"oracle" property is problematic as it gives a much too optimistic impression of 
the actual properties of an estimator. This problematic nature of the "oracle" 
property is also discussed in Leeb and Pötscher (2008) from a risk point of
view; cf. also Yang (2005). The problematic nature of the "oracle" property 
is connected to the fact that the finite-sample distributions of these estimators 
converge to their limits pointwise in the parameter space but not uniformly. 
Hence, the limits often do not reveal the actual properties of the finite-sample 
distributions. An asymptotic analysis using a "moving parameter" asymptotics 
is possible and captures much of the actual behavior of the estimators, see Leeb
and Pötscher (2005), Pötscher and Leeb (2007), and Pötscher and Schneider
(2009). These results lead to a view of these estimators that is less favorable
than what is suggested by the "oracle" property.






The remainder of the paper is organized as follows: In Section 2 we provide 
the main results showing that confidence sets based on sparse estimators are 
necessarily large. These results are extended to "partially" sparse estimators 
in Section 2.1. In Section 3 we consider a thresholding estimator as a simple 
example of a sparse estimator, construct a confidence set based on this estimator, 
and discuss its properties. 

2 On the size of confidence sets based on sparse 
estimators 

Suppose we are given a sequence of statistical experiments

$$\{P_{n,\theta} : \theta \in \mathbb{R}^k\}, \qquad n = 1, 2, \ldots \tag{1}$$

where the probability measures $P_{n,\theta}$ live on suitable measure spaces $(X_n, \mathfrak{X}_n)$.
[Often $P_{n,\theta}$ will arise as the distribution of a random vector $(y_1', \ldots, y_n')'$ where
$y_i$ takes values in a Euclidean space. In this case $X_n$ will be an $n$-fold product
of that Euclidean space and $\mathfrak{X}_n$ will be the associated Borel $\sigma$-field; also $n$ will
then denote sample size.] We assume further that for every $\gamma \in \mathbb{R}^k$ the sequence
of probability measures

$$\{P_{n,\gamma/\sqrt{n}} : n = 1, 2, \ldots\}$$

is contiguous w.r.t. the sequence

$$\{P_{n,0} : n = 1, 2, \ldots\}.$$

This is a quite weak assumption satisfied by many statistical experiments
(including experiments with dependent data); for example, it is certainly satisfied
whenever the experiment is locally asymptotically normal. The above assumption
that the parameter space is $\mathbb{R}^k$ is made only for simplicity of presentation
and is by no means essential, see Remark 7.

Let $\hat{\theta}_n$ denote a sequence of estimators, i.e., $\hat{\theta}_n$ is a measurable function
on $X_n$ taking values in $\mathbb{R}^k$. We say that the estimator $\hat{\theta}_n$ (more precisely, the
sequence of estimators) is sparse if for every $\theta \in \mathbb{R}^k$ and $i = 1, \ldots, k$

$$\lim_{n \to \infty} P_{n,\theta}\left(\hat{\theta}_{n,i} = 0\right) = 1 \quad \text{holds whenever } \theta_i = 0. \tag{2}$$

Here $\hat{\theta}_{n,i}$ and $\theta_i$ denote the $i$-th component of $\hat{\theta}_n$ and of $\theta$, respectively. That
is, the estimator is guaranteed to find the zero components of $\theta$ with probability
approaching one as $n \to \infty$. [The focus on zero-values in the coordinates of $\theta$ is
of course arbitrary. Furthermore, note that Condition (2) is of course satisfied
for nonsensical estimators like $\hat{\theta}_n \equiv 0$. The sparse estimators mentioned in
Section 1 and Remark 1 below, however, are more sensible as they are typically
also consistent for $\theta$.]
Remark 1 Typical examples of sparse estimators are as follows: consider a
linear regression model $Y = X\theta + u$ under standard assumptions (for simplicity
assume $u \sim N(0, \sigma^2 I_n)$ with $\sigma > 0$ known and $X$ nonstochastic with $X'X/n \to Q$,
a positive definite matrix). Suppose a subset of the regressors contained in
$X$ is selected first by an application of a consistent all-subset model selection
procedure (such as, e.g., Schwarz' minimum BIC-method) and then the least
squares estimator based on the selected model is reported, with the coefficients of
the excluded regressor variables being estimated as zero. The resulting estimator
for $\theta$ is a so-called post-model-selection estimator and clearly has the sparsity
property. Another estimator possessing the sparsity property can be obtained via
hard-thresholding as follows: compute the least squares estimator from the full
model $Y = X\theta + u$ and replace those components of the least squares estimator
by zero which have a t-statistic that is less than a threshold $\eta_n$ in absolute value.
The resulting estimator has the sparsity property if $\eta_n \to 0$ and $n^{1/2}\eta_n \to \infty$
holds for $n \to \infty$. As mentioned in the introduction, also a large class of
penalized least squares estimators has the sparsity property, see the references
given there.

Returning to the general discussion, we are interested in confidence sets for
$\theta$ based on $\hat{\theta}_n$. Let $C_n$ be a random set in $\mathbb{R}^k$ in the sense that $C_n = C_n(\omega)$ is
a subset of $\mathbb{R}^k$ for every $\omega \in X_n$ with the property that for every $\theta \in \mathbb{R}^k$

$$\{\omega \in X_n : \theta \in C_n(\omega)\}$$

is measurable, i.e., belongs to $\mathfrak{X}_n$. We say that the random set $C_n$ is based on
the estimator $\hat{\theta}_n$ if $C_n$ satisfies

$$P_{n,\theta}\left(\hat{\theta}_n \in C_n\right) = 1 \tag{3}$$

for every $\theta \in \mathbb{R}^k$. [If the set inside of the probability in (3) is not measurable,
the probability is to be replaced by inner probability.] For example, if $C_n$ is a
$k$-dimensional interval (box) of the form

$$\left[\hat{\theta}_n - a_n, \hat{\theta}_n + b_n\right] \tag{4}$$

where $a_n$ and $b_n$ are random vectors in $\mathbb{R}^k$ with only nonnegative coordinates,
then condition (3) is trivially satisfied. Here we use the notation
$[c, d] = [c_1, d_1] \times \cdots \times [c_k, d_k]$ for vectors $c = (c_1, \ldots, c_k)'$ and $d = (d_1, \ldots, d_k)'$.
We also use the following notation: For a subset $A$ of $\mathbb{R}^k$, let

$$\operatorname{diam}(A) = \sup\{\|x - y\| : x \in A, y \in A\}$$

denote the diameter of $A$ (measured w.r.t. the usual Euclidean norm $\|\cdot\|$);
furthermore, if $e$ is an arbitrary element of $\mathbb{R}^k$ of length 1, and $a \in A$ let

$$\operatorname{ext}(A, a, e) = \sup\{\lambda \ge 0 : \lambda e + a \in A\}.$$

That is, $\operatorname{ext}(A, a, e)$ measures how far the set $A$ extends from the point $a$ into
the direction given by $e$. [Observe that without further conditions (such as, e.g.,
convexity of $A$) not all points of the form $\lambda e + a$ with $\lambda < \operatorname{ext}(A, a, e)$ need to
belong to $A$.]
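
To make the two size measures concrete, the following minimal sketch evaluates them for the box-shaped sets of the form (4), where closed-form expressions are available. The function names `box_diam` and `box_ext` are illustrative choices and do not come from the paper; the center of the box plays the role of the estimator and drops out of both quantities.

```python
import math

def box_diam(a, b):
    """Diameter of the box [c - a, c + b]: the Euclidean length of a + b."""
    return math.sqrt(sum((ai + bi) ** 2 for ai, bi in zip(a, b)))

def box_ext(a, b, e):
    """ext(C, c, e) for the box C = [c - a, c + b] and a unit direction e.

    The ray c + lam * e stays inside the box as long as lam * e_i <= b_i
    for coordinates with e_i > 0 and lam * (-e_i) <= a_i for coordinates
    with e_i < 0, so the supremum is the smallest per-coordinate bound.
    """
    bounds = []
    for ai, bi, ei in zip(a, b, e):
        if ei > 0:
            bounds.append(bi / ei)
        elif ei < 0:
            bounds.append(ai / -ei)
    return min(bounds)  # e has length 1, so at least one bound exists
```

For a coordinate direction the extension reduces to the corresponding entry of `a` or `b`, and in every direction it is bounded by the diameter.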

The following result shows that confidence sets based on a sparse estimator 
are necessarily large. 

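Theorem 2 Suppose the statistical experiment given in (1) satisfies the above
contiguity assumption. Let $\hat{\theta}_n$ be a sparse estimator sequence and let $C_n$ be a
sequence of random sets based on the estimator $\hat{\theta}_n$ in the sense of (3). Assume
that $C_n$ is a confidence set for $\theta$ with asymptotic infimal coverage probability $\delta$,
i.e.,

$$\delta = \liminf_{n \to \infty} \inf_{\theta \in \mathbb{R}^k} P_{n,\theta}\left(\theta \in C_n\right).$$

Then for every $t \ge 0$ and every $e \in \mathbb{R}^k$ of length 1 we have

$$\liminf_{n \to \infty} \sup_{\theta \in \mathbb{R}^k} P_{n,\theta}\left(\sqrt{n}\,\operatorname{ext}(C_n, \hat{\theta}_n, e) \ge t\right) \ge \delta. \tag{5}$$

In particular, we have for every $t \ge 0$

$$\liminf_{n \to \infty} \sup_{\theta \in \mathbb{R}^k} P_{n,\theta}\left(\sqrt{n}\,\operatorname{diam}(C_n) \ge t\right) \ge \delta. \tag{6}$$

[If the set inside of the probability in (5) or (6) is not measurable, the probability
is to be replaced by inner probability.]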

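Proof. Since obviously $\operatorname{diam}(C_n) \ge \operatorname{ext}(C_n, \hat{\theta}_n, e)$ holds with $P_{n,\theta}$-probability 1
for all $\theta$ in view of (3), it suffices to prove (5). Now, for every sequence $\theta_n \in \mathbb{R}^k$
we have in view of (3)

$$\begin{aligned}
\delta = \liminf_{n \to \infty} \inf_{\theta \in \mathbb{R}^k} P_{n,\theta}\left(\theta \in C_n\right) &\le \liminf_{n \to \infty} P_{n,\theta_n}\left(\theta_n \in C_n\right) \\
&= \liminf_{n \to \infty} \left\{ P_{n,\theta_n}\left(\theta_n \in C_n, \hat{\theta}_n \in C_n, \hat{\theta}_n = 0\right) + P_{n,\theta_n}\left(\theta_n \in C_n, \hat{\theta}_n \ne 0\right) \right\}.
\end{aligned} \tag{7}$$

Sparsity implies

$$\lim_{n \to \infty} P_{n,0}\left(\hat{\theta}_n \ne 0\right) = 0,$$

and hence for $\theta_n = \gamma/\sqrt{n}$ the contiguity assumption implies

$$\limsup_{n \to \infty} P_{n,\theta_n}\left(\theta_n \in C_n, \hat{\theta}_n \ne 0\right) \le \lim_{n \to \infty} P_{n,\theta_n}\left(\hat{\theta}_n \ne 0\right) = 0.$$

Consequently, we obtain from (7) for $\theta_n = \gamma/\sqrt{n}$ with $\gamma \ne 0$

$$\begin{aligned}
\delta &\le \liminf_{n \to \infty} P_{n,\theta_n}\left(\theta_n \in C_n, \hat{\theta}_n \in C_n, \hat{\theta}_n = 0\right) \\
&\le \liminf_{n \to \infty} P_{n,\theta_n}\left(\sqrt{n}\,\operatorname{ext}(C_n, \hat{\theta}_n, \gamma/\|\gamma\|) \ge \|\gamma\|\right)
\end{aligned} \tag{8}$$

because of the obvious inclusion

$$\left\{\theta_n \in C_n, \hat{\theta}_n \in C_n, \hat{\theta}_n = 0\right\} \subseteq \left\{\operatorname{ext}(C_n, \hat{\theta}_n, \theta_n/\|\theta_n\|) \ge \|\theta_n\|\right\}.$$

Since $\gamma$ was arbitrary, the result (5) follows from (8) upon identifying $t$ and $\|\gamma\|$.
■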

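Corollary 3 Suppose the assumptions of Theorem 2 are satisfied and $C_n$ is a
confidence 'interval' of the form (4). Then for every $i = 1, \ldots, k$ and every
$t \ge 0$

$$\liminf_{n \to \infty} \sup_{\theta \in \mathbb{R}^k} P_{n,\theta}\left(\sqrt{n}\,a_{n,i} \ge t\right) \ge \delta$$

and

$$\liminf_{n \to \infty} \sup_{\theta \in \mathbb{R}^k} P_{n,\theta}\left(\sqrt{n}\,b_{n,i} \ge t\right) \ge \delta$$

hold, where $a_{n,i}$ and $b_{n,i}$ denote the $i$-th coordinate of $a_n$ and $b_n$, respectively.
In particular, if $a_n$ and $b_n$ are nonrandom,

$$\liminf_{n \to \infty} \sqrt{n}\,a_{n,i} = \liminf_{n \to \infty} \sqrt{n}\,b_{n,i} = \infty$$

holds for every $i = 1, \ldots, k$, provided that $\delta > 0$.

Proof. Follows immediately from the previous theorem upon observing that
(4) implies $\operatorname{ext}(C_n, \hat{\theta}_n, -e_i) = a_{n,i}$ and $\operatorname{ext}(C_n, \hat{\theta}_n, e_i) = b_{n,i}$ where $e_i$ denotes
the $i$-th standard basis vector. ■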

It is instructive to compare with standard confidence sets. For example, in a
normal linear regression model $\sqrt{n}$ times the diameter of the standard confidence
ellipsoid is stochastically bounded uniformly in $\theta$. In contrast, Theorem 2 tells
us that any confidence set $C_n$ based on sparse estimators with $\sqrt{n}\,\operatorname{diam}(C_n)$
being stochastically bounded uniformly in $\theta$ necessarily has infimal coverage
probability equal to zero.

Remark 4 (Nuisance parameters) Suppose that the sequence of statistical
experiments is of the form $\{P_{n,\theta,\tau} : \theta \in \mathbb{R}^k, \tau \in T\}$ where $\theta$ is the parameter of
interest and $\tau$ is now a (possibly infinite dimensional) nuisance parameter.
Theorem 2 can then clearly be applied to the parametric subfamilies $\{P_{n,\theta,\tau} : \theta \in \mathbb{R}^k\}$
for $\tau \in T$ (provided the conditions of the theorem are satisfied). In particular,
the following is then an immediate consequence: suppose that the contiguity
condition and sparsity condition are satisfied for every $\tau \in T$. Suppose further that
we are again interested in confidence sets for $\theta$ based on $\hat{\theta}_n$ (in the sense that
$P_{n,\theta,\tau}(\hat{\theta}_n \in C_n) = 1$ for all $\theta \in \mathbb{R}^k$, $\tau \in T$) that have asymptotic infimal (over
$\theta$ and $\tau$) coverage probability $\delta$. Then results analogous to (5) and (6), but with
the supremum extending now over $\mathbb{R}^k \times T$, hold.

Remark 5 (Confidence sets for linear functions of $\theta$) Suppose that a statistical
experiment $\{P_{n,\theta} : \theta \in \mathbb{R}^k\}$ satisfying the aforementioned contiguity property
and a sparse estimator $\hat{\theta}_n$ are given but that we are interested in setting a
confidence set for $\vartheta = A\theta$ that is based on $\hat{\vartheta}_n = A\hat{\theta}_n$, where $A$ is a given $q \times k$ matrix.
Without loss of generality assume that $A$ has full row rank. [In particular, this
covers the case where we have a sparse estimator for $\theta$, but are interested in
confidence sets for a subvector only.] Suppose $C_n$ is a confidence set for $\vartheta$ that
is based on $\hat{\vartheta}_n$ (in the sense that $P_{n,\theta}(\hat{\vartheta}_n \in C_n) = 1$ for all $\theta \in \mathbb{R}^k$) and that
has asymptotic infimal coverage probability $\delta$. Then essentially the same proof
as for Theorem 2 shows that for every $t \ge 0$ and every $e \in \mathbb{R}^q$ of length 1 we
have

$$\liminf_{n \to \infty} \sup_{\theta \in \mathbb{R}^k} P_{n,\theta}\left(\sqrt{n}\,\operatorname{ext}(C_n, \hat{\vartheta}_n, e) \ge t\right) \ge \delta \tag{9}$$

and consequently also the analogue of (6) holds.

Remark 6 The contiguity assumption together with the sparsity of the estimator
was used in the proof of Theorem 2 to imply $\lim_{n \to \infty} P_{n,\theta_n}(\hat{\theta}_n \ne 0) = 0$ for
all sequences of the form $\theta_n = \gamma/\sqrt{n}$, $\gamma \in \mathbb{R}^k$. For some important classes of
sparse estimators this relation can even be established for all sequences of the
form $\theta_n = \gamma/v_n$, $\gamma \in \mathbb{R}^k$, where $v_n$ are certain sequences that diverge to infinity,
but at a rate slower than $\sqrt{n}$ (cf. Leeb and Pötscher (2005), Pötscher and Leeb
(2007, Proposition 1), Pötscher and Schneider (2009, Proposition 1)). Inspection
of the proof of Theorem 2 shows that then a stronger result follows, namely
that (5) and (6) hold even with $\sqrt{n}$ replaced by $v_n$. This shows that in such
a case confidence sets based on sparse estimators are even larger than what is
predicted by Theorem 2. This simple extension immediately applies mutatis
mutandis also to the other results in the paper (with the exception of Theorem 10,
an extension of which would require a separate analysis). The example discussed
in Section 3 nicely illustrates the phenomenon just described.

Remark 7 The assumption that the parameter space indexing the statistical
experiment, say $\Theta$, is an entire Euclidean space is not essential as can be seen
from the proofs. The results equally well hold if, e.g., $\Theta$ is a subset of Euclidean
space that contains a ball with center at zero (simply put $\theta_n = \gamma/\sqrt{n}$ if this
belongs to $\Theta$, and set $\theta_n = 0$ otherwise). In fact, $\Theta$ could even be allowed to
depend on $n$ and to "shrink" to zero at a rate slower than $n^{-1/2}$. [In that sense
the results are of a "local" rather than of a "global" nature.]

Remark 8 Suppose the contiguity assumption is satisfied and the estimator
sequence $\hat{\theta}_n$ is sparse. Then the uniform convergence rate of $\hat{\theta}_n$ is necessarily
slower than $n^{-1/2}$. In fact, more is true: for every real number $M > 0$ we have

$$\liminf_{n \to \infty} \sup_{\theta \in \mathbb{R}^k} P_{n,\theta}\left(n^{1/2}\|\hat{\theta}_n - \theta\| > M\right) = 1. \tag{10}$$

To see this, set $\theta_n = \gamma/n^{1/2}$ with $\|\gamma\| > M$ and observe that the left-hand side
in the above display is not less than

$$\liminf_{n \to \infty} P_{n,\theta_n}\left(n^{1/2}\|\hat{\theta}_n - \theta_n\| > M\right) = \liminf_{n \to \infty} P_{n,\theta_n}\left(n^{1/2}\|\theta_n\| > M, \hat{\theta}_n = 0\right) = 1,$$

the displayed equalities holding true in view of sparsity and contiguity. [If
$\lim_{n \to \infty} P_{n,\theta_n}(\hat{\theta}_n \ne 0) = 0$ holds for all sequences of the form $\theta_n = \gamma/v_n$,
$\gamma \in \mathbb{R}^k$, where $v_n > 0$ is a given sequence (cf. Remark 6), then obviously (10)
holds with $n^{1/2}$ replaced by $v_n$. Furthermore, the results in this remark continue
to hold if the supremum over $\theta \in \mathbb{R}^k$ is replaced by a supremum over a set $\Theta$
that contains a ball with center at zero.]
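
The argument behind (10) can be checked numerically in the scalar Gaussian location model of Section 3, where the hard-thresholding estimator equals zero exactly when the sample mean is at most $\eta_n$ in absolute value. The sketch below is illustrative only: the threshold $\eta_n = n^{-1/4}$, the value $\gamma = 3$, and the function name are assumptions made for this example. Along $\theta_n = \gamma/\sqrt{n}$ it evaluates the exact probability that the estimator is zero; on that event the scaled error $n^{1/2}|\hat{\theta}_n - \theta_n|$ equals $|\gamma|$, which exceeds any $M < |\gamma|$.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

def prob_estimate_is_zero(n, gamma):
    """P(hat_theta_n = 0) under theta_n = gamma / sqrt(n).

    Assumes N(theta, 1) data and the illustrative threshold eta_n = n ** -0.25,
    so that sqrt(n) * (ybar - theta_n) is standard normal and the event
    |ybar| <= eta_n becomes -h - gamma <= Z <= h - gamma with h = sqrt(n) * eta_n.
    """
    h = n ** 0.5 * n ** -0.25
    return PHI(h - gamma) - PHI(-h - gamma)

for n in (10**2, 10**4, 10**6):
    print(n, prob_estimate_is_zero(n, gamma=3.0))
```

The printed probabilities increase towards one, so the probability that the scaled error exceeds $M$ cannot stay bounded away from one uniformly in $\theta$.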

2.1 Confidence sets based on partially sparse estimators 

Suppose that in the framework of (1) the parameter vector $\theta$ is partitioned as
$\theta = (\alpha', \beta')'$ where $\alpha$ is $(k - k_\beta) \times 1$ and $\beta$ is $k_\beta \times 1$ ($0 < k_\beta < k$). Furthermore,
suppose that the estimator $\hat{\theta}_n = (\hat{\alpha}_n', \hat{\beta}_n')'$ is 'partially' sparse in the sense that
it finds the zeros in $\beta$ with probability approaching 1 (but not necessarily the
zeros in $\alpha$). That is, for every $\theta \in \mathbb{R}^k$ and $i = 1, \ldots, k_\beta$

$$\lim_{n \to \infty} P_{n,\theta}\left(\hat{\beta}_{n,i} = 0\right) = 1 \quad \text{holds whenever } \beta_i = 0. \tag{11}$$

E.g., $\hat{\theta}_n$ could be a post-model-selection estimator based on a consistent model
selection procedure that only subjects the elements in $\beta$ to selection, the
elements in $\alpha$ being 'protected'.

If we are now interested in a confidence set for $\beta$ that is based on $\hat{\beta}_n$, we
can immediately apply the results obtained so far: By viewing $\alpha$ as a 'nuisance'
parameter, we can use Remark 4 to conclude that Theorem 2 applies mutatis
mutandis to this situation. Moreover, combining the reasoning in Remarks 4
and 5 we can then immediately obtain a result similar to (9) for confidence sets
for $A\beta$ that are based on $A\hat{\beta}_n$, $A$ being an arbitrary matrix of full row rank.
For the sake of brevity we do not spell out the details which are easily obtained
from the outline just given.

The above results, however, do not cover the case where one is interested in
a confidence set for $\theta$ based on a partially sparse estimator $\hat{\theta}_n$, or more generally
the case of confidence sets for $A\theta$ based on $A\hat{\theta}_n$, where the linear function $A\theta$
is also allowed to depend on $\alpha$. For this case we have the following result.

Theorem 9 Suppose the statistical experiment given in (1) is such that for
some $\alpha \in \mathbb{R}^{k - k_\beta}$ the sequence $P_{n,(\alpha', \gamma'/\sqrt{n})'}$ is contiguous w.r.t. $P_{n,(\alpha', 0)'}$ for
every $\gamma \in \mathbb{R}^{k_\beta}$. Let $\hat{\theta}_n$ be an estimator sequence that is partially sparse in the
sense of (11). Let $A$ be a $q \times k$ matrix of full row rank, which is partitioned
conformably with $\theta$ as $A = (A_1, A_2)$, and that satisfies $\operatorname{rank} A_1 < q$. Let $C_n$ be a
sequence of random sets based on $A\hat{\theta}_n$ (in the sense that $P_{n,\theta}(A\hat{\theta}_n \in C_n) = 1$
for all $\theta \in \mathbb{R}^k$). Assume that $C_n$ is a confidence set for $A\theta$ with asymptotic
infimal coverage probability $\delta$, i.e.,

$$\delta = \liminf_{n \to \infty} \inf_{\theta \in \mathbb{R}^k} P_{n,\theta}\left(A\theta \in C_n\right).$$

Then for every $t \ge 0$ we have

$$\liminf_{n \to \infty} \sup_{\theta \in \mathbb{R}^k} P_{n,\theta}\left(\sqrt{n}\,\operatorname{diam}(C_n) \ge t\right) \ge \delta. \tag{12}$$

[If the set inside of the probability in (12) is not measurable, the probability is
to be replaced by inner probability.]

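Proof. Consider sequences $\theta_n = (\alpha', \gamma'/\sqrt{n})' \in \mathbb{R}^k$ where $\alpha$ is as in the theorem.
Then similar as in the proof of Theorem 2 exploiting partial sparsity and
contiguity we arrive at

$$\begin{aligned}
\delta &\le \liminf_{n \to \infty} P_{n,\theta_n}\left(A\theta_n \in C_n\right) \\
&\le \liminf_{n \to \infty} P_{n,\theta_n}\left(A\theta_n \in C_n, A\hat{\theta}_n \in C_n, \hat{\beta}_n = 0\right) \\
&\le \liminf_{n \to \infty} P_{n,\theta_n}\left(\operatorname{diam}(C_n) \ge \left\|A\left((\alpha - \hat{\alpha}_n)', \gamma'/\sqrt{n}\right)'\right\|\right).
\end{aligned} \tag{13}$$

By the assumption on $A$ there exists a vector $\gamma_0$ such that $A_2\gamma_0$ is non-zero
and is linearly independent of the range space of $A_1$. Consequently, $\Pi A_2\gamma_0 \ne 0$,
where $\Pi$ denotes the orthogonal projection on the orthogonal complement of
the range space of $A_1$. Set $\gamma = c\gamma_0$ for arbitrary $c$. Then

$$\left\|A\left((\alpha - \hat{\alpha}_n)', \gamma'/\sqrt{n}\right)'\right\|^2 = \left\|A_1(\alpha - \hat{\alpha}_n) + A_2\gamma/\sqrt{n}\right\|^2 \ge n^{-1}c^2\left\|\Pi A_2\gamma_0\right\|^2.$$

Combined with (13), this gives

$$\delta \le \liminf_{n \to \infty} P_{n,\theta_n}\left(\sqrt{n}\,\operatorname{diam}(C_n) \ge |c|\,\|\Pi A_2\gamma_0\|\right).$$

Since $\|\Pi A_2\gamma_0\| > 0$ by construction and since $c$ was arbitrary, the result (12)
follows upon identifying $t$ and $|c|\,\|\Pi A_2\gamma_0\|$. ■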

Some simple generalizations are possible: Inspection of the proof shows that
$\delta$ may be replaced by $\delta(\alpha) = \liminf_{n \to \infty} \inf_{\beta \in \mathbb{R}^{k_\beta}} P_{n,\theta}(A\theta \in C_n)$ where $\alpha$ is as
in the theorem and $\theta = (\alpha', \beta')'$. Furthermore, the partial sparsity condition
(11) only needs to hold for all $\theta = (\alpha', \beta')'$ with $\alpha$ as in the theorem. A similar
remark applies to Theorem 10 given below.

The condition on $A$ in the above theorem is, for example, satisfied when
considering confidence sets for the entire vector $\theta$ as this corresponds to the case
$A = I_k$ (and $q = k$). [The condition is also satisfied in case $A = (0_{k_\beta \times (k - k_\beta)}, I_{k_\beta})$
which corresponds to setting confidence sets for $\beta$. However, in this case already
the extension of Theorem 2 discussed prior to Theorem 9 applies.]

Theorem 9 does not cover the case where a confidence set is desired for $\alpha$
only (i.e., $A = (I_{k - k_\beta}, 0_{(k - k_\beta) \times k_\beta})$). In fact, without further assumptions on
the estimator $\hat{\theta}_n$ no result of the above sort is in general possible in this case
(to see this consider the case where $\hat{\alpha}_n$ and $\hat{\beta}_n$ are independent and $\hat{\alpha}_n$ is a
well-behaved estimator). However, under additional assumptions, results that
show that confidence sets for $\alpha$ are also necessarily large will be obtained next.
We first present the result and subsequently discuss the assumptions.
Theorem 10 Suppose the statistical experiment given in (1) is such that for
some $\alpha \in \mathbb{R}^{k - k_\beta}$ the sequence $P_{n,(\alpha', \gamma'/\sqrt{n})'}$ is contiguous w.r.t. $P_{n,(\alpha', 0)'}$ for
every $\gamma \in \mathbb{R}^{k_\beta}$. Let $\hat{\theta}_n$ be an estimator sequence that is partially sparse in the
sense of (11). Suppose that there exists a $(k - k_\beta) \times k_\beta$-matrix $D$ such that for
every $\gamma$ the random vector $n^{1/2}(\hat{\alpha}_n - \alpha)$ converges in $P_{n,(\alpha', \gamma'/\sqrt{n})'}$-distribution
to $Z + D\gamma$ where $Z$ is a $(k - k_\beta) \times 1$ random vector with a distribution that is
independent of $\gamma$. Let $A$ be a $q \times k$ matrix of full row rank, which is partitioned
conformably with $\theta$ as $A = (A_1, A_2)$, and assume that $A_1D - A_2 \ne 0$. Let $C_n$ be a
sequence of random sets based on $A\hat{\theta}_n$ (in the sense that $P_{n,\theta}(A\hat{\theta}_n \in C_n) = 1$
for all $\theta \in \mathbb{R}^k$). Assume that $C_n$ is a confidence set for $A\theta$ with asymptotic
infimal coverage probability $\delta$, i.e.,

$$\delta = \liminf_{n \to \infty} \inf_{\theta \in \mathbb{R}^k} P_{n,\theta}\left(A\theta \in C_n\right).$$

Then for every $t \ge 0$ we have

$$\liminf_{n \to \infty} \sup_{\theta \in \mathbb{R}^k} P_{n,\theta}\left(\sqrt{n}\,\operatorname{diam}(C_n) \ge t\right) \ge \delta. \tag{14}$$

[If the set inside of the probability in (14) is not measurable, the probability is
to be replaced by inner probability.]

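Proof. Consider sequences $\theta_n = (\alpha', \gamma'/\sqrt{n})' \in \mathbb{R}^k$ where $\alpha$ is as in the theorem.
Then for every $t \ge 0$ we have

$$\begin{aligned}
\delta &\le \liminf_{n \to \infty} P_{n,\theta_n}\left(A\theta_n \in C_n\right) = \liminf_{n \to \infty} P_{n,\theta_n}\left(A\theta_n \in C_n, A\hat{\theta}_n \in C_n\right) \\
&\le \liminf_{n \to \infty} P_{n,\theta_n}\left(A\theta_n \in C_n, A\hat{\theta}_n \in C_n, n^{1/2}\|A(\hat{\theta}_n - \theta_n)\| \ge t\right) \\
&\qquad + \limsup_{n \to \infty} P_{n,\theta_n}\left(n^{1/2}\|A(\hat{\theta}_n - \theta_n)\| < t\right) \\
&\le \liminf_{n \to \infty} P_{n,\theta_n}\left(n^{1/2}\operatorname{diam}(C_n) \ge t\right) + \limsup_{n \to \infty} P_{n,\theta_n}\left(n^{1/2}\|A(\hat{\theta}_n - \theta_n)\| < t\right).
\end{aligned} \tag{15}$$

Exploiting partial sparsity and contiguity we get

$$\begin{aligned}
\limsup_{n \to \infty} P_{n,\theta_n}\left(n^{1/2}\|A(\hat{\theta}_n - \theta_n)\| < t\right)
&\le \limsup_{n \to \infty} P_{n,\theta_n}\left(\hat{\beta}_n = 0, n^{1/2}\|A(\hat{\theta}_n - \theta_n)\| < t\right) + \limsup_{n \to \infty} P_{n,\theta_n}\left(\hat{\beta}_n \ne 0\right) \\
&= \limsup_{n \to \infty} P_{n,\theta_n}\left(\hat{\beta}_n = 0, n^{1/2}\|A(\hat{\theta}_n - \theta_n)\| < t\right) \\
&\le \limsup_{n \to \infty} P_{n,\theta_n}\left(n^{1/2}\left\|A_1(\hat{\alpha}_n - \alpha) - A_2\gamma/\sqrt{n}\right\| < t\right) \\
&= \limsup_{n \to \infty} P_{n,\theta_n}\left(\|X_n + (A_1D - A_2)\gamma\| < t\right) \\
&\le \limsup_{n \to \infty} P_{n,\theta_n}\left(\|X_n\| > \|(A_1D - A_2)\gamma\| - t\right)
\end{aligned} \tag{16}$$

where $X_n$ converges to $A_1Z$ in $P_{n,\theta_n}$-distribution. Since $A_1D - A_2 \ne 0$ by
assumption, we can find a $\gamma$ such that $\|(A_1D - A_2)\gamma\| - t$ is arbitrarily large,
making the far right-hand side of (16) arbitrarily small. This, together with
(15), establishes the result. ■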

Note that the case where a confidence set for $\alpha$ is sought, that is,
$A = (I_{k - k_\beta}, 0_{(k - k_\beta) \times k_\beta})$, which was not covered by Theorem 9, is covered by
Theorem 10 except in the special case where $D = 0$.

The weak convergence assumption in the above theorem merits some discussion:
Suppose $\hat{\theta}_n$ is a post-model-selection estimator based on a model selection
procedure that consistently finds the zeroes in $\beta$ and then computes $\hat{\theta}_n$ as the
restricted maximum likelihood estimator $\hat{\theta}_n(R)$ under the zero-restrictions in
$\beta$. Under the usual regularity conditions, the restricted maximum likelihood
estimator $\hat{\alpha}_n(R)$ for $\alpha$ will then satisfy that $n^{1/2}(\hat{\alpha}_n(R) - \alpha)$ converges to a
$N(D\gamma, \Sigma)$-distribution under the sequence of local alternatives $\theta_n = (\alpha', \gamma'/\sqrt{n})'$.
Since $\lim_{n \to \infty} P_{n,\theta_n}(\hat{\beta}_n = 0) = 1$ by partial sparsity and contiguity, the
estimators $\hat{\alpha}_n$ and $\hat{\alpha}_n(R)$ coincide with $P_{n,\theta_n}$-probability approaching one. This
shows that the assumption on $\hat{\alpha}_n$ will typically be satisfied for such
post-model-selection estimators with $Z \sim N(0, \Sigma)$. [For a precise statement of such a result
in a simple example see Leeb and Pötscher (2005, Proposition A.2).] While we
expect that this assumption on the asymptotic behavior of $\hat{\alpha}_n$ is also shared by
many other partially sparse estimators, this remains to be verified on a case by
case basis.
3 An Example: A confidence set based on a
hard-thresholding estimator

Suppose the data $y_1, \ldots, y_n$ are independent identically distributed as $N(\theta, 1)$,
$\theta \in \mathbb{R}$. Let the hard-thresholding estimator $\hat{\theta}_n$ be given by

$$\hat{\theta}_n = \bar{y}\,\mathbf{1}(|\bar{y}| > \eta_n)$$

where the threshold $\eta_n$ is a positive real number and $\bar{y}$ denotes the maximum
likelihood estimator, i.e., the arithmetic mean of the data. Of course, $\hat{\theta}_n$ is
nothing else than a post-model-selection estimator following a t-type test of the
hypothesis $\theta = 0$ versus the alternative $\theta \ne 0$. It is well-known and easy to
see that $\hat{\theta}_n$ satisfies the sparsity condition if $\eta_n \to 0$ and $n^{1/2}\eta_n \to \infty$ (i.e., the
underlying model selection procedure is consistent); in this case then $n^{1/2}(\hat{\theta}_n - \theta)$
converges to a standard normal distribution if $\theta \ne 0$, whereas it converges to
pointmass at zero if $\theta = 0$. Note that $\hat{\theta}_n$ - with such a choice of the threshold $\eta_n$
- is an instance of Hodges' estimator. In contrast, if $\eta_n \to 0$ and $n^{1/2}\eta_n \to e$,
$0 \le e < \infty$, the estimator $\hat{\theta}_n$ is a post-model-selection estimator based on
a conservative model selection procedure. See Pötscher and Leeb (2007) for
further discussion and references.
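
The estimator just defined is simple enough to write down directly. The following is a minimal sketch, assuming the illustrative threshold $\eta_n = n^{-1/4}$ in the usage note; the function names, the number of replications, and the seed are choices made for this example only. It computes the hard-thresholding estimate from a sample and estimates by simulation how often the estimate is exactly zero.

```python
import random
from statistics import fmean

def hard_threshold_estimate(data, eta):
    """Return ybar * 1(|ybar| > eta), where ybar is the sample mean of `data`."""
    ybar = fmean(data)
    return ybar if abs(ybar) > eta else 0.0

def zero_frequency(n, theta, eta, reps=200, seed=0):
    """Monte Carlo frequency of the event {estimate == 0} for N(theta, 1) data."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    zeros = 0
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        if hard_threshold_estimate(sample, eta) == 0.0:
            zeros += 1
    return zeros / reps
```

With, say, `n = 400` and `eta = 400 ** -0.25`, the frequency should be close to one at `theta = 0` and close to zero at `theta = 1`, in line with the sparsity property and the consistency of the underlying model selection step.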

In the consistent model selection case the estimator possesses the "oracle"
property suggesting as a confidence interval the "naive" interval given by
$C_n^{naive} = \{0\}$ if $\hat{\theta}_n = 0$ and by
$C_n^{naive} = [\hat{\theta}_n - n^{-1/2}z_{(1-\delta)/2}, \hat{\theta}_n + n^{-1/2}z_{(1-\delta)/2}]$ otherwise,
where $\delta$ is the nominal coverage level and $z_{(1-\delta)/2}$ is the $1 - (1 - \delta)/2$-quantile of
the standard normal distribution. This interval satisfies $P_{n,\theta}(\theta \in C_n^{naive}) \to \delta$
for every $\theta$, but - as discussed in the introduction and as follows from the
results in Section 2 - it is not honest and, in fact, has infimal coverage probability
converging to zero. A related, but infeasible, construction is to consider the
intervals $C_n^* = [\hat{\theta}_n - c_n(\theta), \hat{\theta}_n + c_n(\theta)]$ where $c_n(\theta)$ is chosen as small as possible
subject to $P_{n,\theta}(\theta \in C_n^*) = \delta$ for every $\theta$. [Note that $C_n^{naive}$ can be viewed as
being obtained from $C_n^*$ by replacing $c_n(\theta)$ by the limits $c_\infty(\theta)$ for $n \to \infty$, where
$c_\infty(\theta) = 0$ if $\theta = 0$ and $c_\infty(\theta) = z_{(1-\delta)/2}$ if $\theta \ne 0$, and then by replacing $\theta$ by $\hat{\theta}_n$
in $c_\infty(\theta)$.] An obvious idea to obtain a feasible and honest interval is now to use
$\hat{c}_n = \max_{\theta \in \mathbb{R}} c_n(\theta)$ as the half-length of the interval, i.e. $C_n = [\hat{\theta}_n - \hat{c}_n, \hat{\theta}_n + \hat{c}_n]$.
From Theorem 2 we know that $\sqrt{n}\,\hat{c}_n \to \infty$ in the case where $\eta_n \to 0$ and
$n^{1/2}\eta_n \to \infty$ (and if $\delta > 0$), but it is instructive to study the behavior of $C_n$ in
more detail.
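
The lack of honesty of the "naive" interval can be seen from its exact coverage probability, which is available in closed form in this model. The sketch below is a hedged illustration: it assumes the "naive" interval has half-length $n^{-1/2}z_{(1-\delta)/2}$ when the estimator is non-zero and equals $\{0\}$ otherwise, and the function name and the numerical values in the usage note are choices made for this example.

```python
from statistics import NormalDist

ND = NormalDist()  # standard normal distribution

def naive_coverage(n, theta, eta, delta):
    """Exact coverage probability of the 'naive' interval at the parameter theta.

    Writes Z = sqrt(n) * (ybar - theta), tau = sqrt(n) * theta, h = sqrt(n) * eta.
    The estimator is zero iff Z lies in [-h - tau, h - tau].
    """
    z = ND.inv_cdf(1 - (1 - delta) / 2)
    root_n = n ** 0.5
    tau, h = root_n * theta, root_n * eta
    # Non-zero estimate: theta is covered iff |Z| <= z and Z is outside [-h - tau, h - tau].
    lo, hi = max(-z, -h - tau), min(z, h - tau)
    overlap = ND.cdf(hi) - ND.cdf(lo) if lo < hi else 0.0
    cover_selected = (ND.cdf(z) - ND.cdf(-z)) - overlap
    # Zero estimate: the interval is {0}, which covers theta only if theta == 0.
    cover_zeroed = ND.cdf(h - tau) - ND.cdf(-h - tau) if theta == 0 else 0.0
    return cover_selected + cover_zeroed
```

For instance, with `n = 10**4`, `eta = 0.1` and `delta = 0.95`, the coverage at the fixed value `theta = 1.0` is essentially the nominal level, whereas at `theta = eta / 2` it is essentially zero, because there the estimator is set to zero with overwhelming probability while the parameter is not zero.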

We therefore consider now confidence intervals $C_n$ for $\theta$ of the form
$C_n = [\hat{\theta}_n - a_n, \hat{\theta}_n + b_n]$ with nonnegative constants $a_n$ and $b_n$ (thus removing the
symmetry restriction on the interval). Note that the subsequent result is a
finite-sample result and hence does not involve any assumptions on the behavior
of $\eta_n$.

Proposition 11 For every $n \ge 1$, the interval $C_n = [\hat{\theta}_n - a_n, \hat{\theta}_n + b_n]$ has an
infimal coverage probability satisfying

$$\inf_{\theta \in \mathbb{R}} P_{n,\theta}\left(\theta \in C_n\right) = \begin{cases}
\Phi(n^{1/2}(a_n - \eta_n)) - \Phi(-n^{1/2}b_n) & \text{if } \eta_n \le a_n + b_n \text{ and } a_n \le b_n \\
\Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-b_n + \eta_n)) & \text{if } \eta_n \le a_n + b_n \text{ and } a_n \ge b_n \\
0 & \text{if } \eta_n > a_n + b_n
\end{cases}$$

where $\Phi$ denotes the standard normal cumulative distribution function.

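Proof. Elementary calculations and the fact that $n^{1/2}(\bar{y} - \theta)$ is standard
normally distributed give for the coverage probability $p_n(\theta) = P_{n,\theta}(\theta \in C_n)$

$$\begin{aligned}
p_n(\theta) &= P_{n,\theta}\left(-n^{1/2}b_n \le n^{1/2}(\hat{\theta}_n - \theta) \le n^{1/2}a_n\right) \\
&= \Pr\left(-n^{1/2}b_n \le Z \le n^{1/2}a_n, |Z + n^{1/2}\theta| > n^{1/2}\eta_n\right) \\
&\qquad + \Pr\left(-b_n \le -\theta \le a_n, |Z + n^{1/2}\theta| \le n^{1/2}\eta_n\right)
\end{aligned}$$

where $Z$ is a standard normally distributed random variable and $\Pr$ denotes
a generic probability. Simple, albeit tedious computations give the coverage
probability as follows. If $\eta_n > a_n + b_n$

$$p_n(\theta) = \begin{cases}
\Phi(n^{1/2}a_n) - \Phi(-n^{1/2}b_n) & \text{if } \theta < -a_n - \eta_n \text{ or } \theta > b_n + \eta_n \\
\Phi(n^{1/2}(-\theta - \eta_n)) - \Phi(-n^{1/2}b_n) & \text{if } -a_n - \eta_n \le \theta < b_n - \eta_n \\
0 & \text{if } b_n - \eta_n \le \theta < -a_n \text{ or } b_n < \theta \le -a_n + \eta_n \\
\Phi(n^{1/2}(-\theta + \eta_n)) - \Phi(n^{1/2}(-\theta - \eta_n)) & \text{if } -a_n \le \theta \le b_n \\
\Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-\theta + \eta_n)) & \text{if } -a_n + \eta_n < \theta \le b_n + \eta_n.
\end{cases}$$

Hence, the infimal coverage probability in this case is obviously zero. Next, if
$(a_n + b_n)/2 \le \eta_n \le a_n + b_n$ then

$$p_n(\theta) = \begin{cases}
\Phi(n^{1/2}a_n) - \Phi(-n^{1/2}b_n) & \text{if } \theta < -a_n - \eta_n \text{ or } \theta > b_n + \eta_n \\
\Phi(n^{1/2}(-\theta - \eta_n)) - \Phi(-n^{1/2}b_n) & \text{if } -a_n - \eta_n \le \theta < -a_n \\
\Phi(n^{1/2}(-\theta + \eta_n)) - \Phi(-n^{1/2}b_n) & \text{if } -a_n \le \theta < b_n - \eta_n \\
\Phi(n^{1/2}(-\theta + \eta_n)) - \Phi(n^{1/2}(-\theta - \eta_n)) & \text{if } b_n - \eta_n \le \theta \le -a_n + \eta_n \\
\Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-\theta - \eta_n)) & \text{if } -a_n + \eta_n < \theta \le b_n \\
\Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-\theta + \eta_n)) & \text{if } b_n < \theta \le b_n + \eta_n
\end{cases}$$

and if $\eta_n < (a_n + b_n)/2$

$$p_n(\theta) = \begin{cases}
\Phi(n^{1/2}a_n) - \Phi(-n^{1/2}b_n) & \text{if } \theta < -a_n - \eta_n \text{ or } \theta > b_n + \eta_n \text{ or } -a_n + \eta_n \le \theta \le b_n - \eta_n \\
\Phi(n^{1/2}(-\theta - \eta_n)) - \Phi(-n^{1/2}b_n) & \text{if } -a_n - \eta_n \le \theta < -a_n \\
\Phi(n^{1/2}(-\theta + \eta_n)) - \Phi(-n^{1/2}b_n) & \text{if } -a_n \le \theta < -a_n + \eta_n \\
\Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-\theta - \eta_n)) & \text{if } b_n - \eta_n < \theta \le b_n \\
\Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-\theta + \eta_n)) & \text{if } b_n < \theta \le b_n + \eta_n.
\end{cases}$$

Inspection shows that in both cases the function does not have a minimum, but
the infimum equals the smaller of the left-hand side limit $p_n(-a_n-)$ and the
right-hand side limit $p_n(b_n+)$, which shows that the infimum of $p_n(\theta)$ equals
$\min[\Phi(n^{1/2}(a_n - \eta_n)) - \Phi(-n^{1/2}b_n), \Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-b_n + \eta_n))]$. ■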

As a point of interest we note that the coverage probability $p_n(\theta)$ has exactly
two discontinuity points (jumps), one at $\theta = -a_n$ and one at $\theta = b_n$, except in
the trivial case $a_n = b_n = 0$ where the two discontinuity points merge into one.
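
The closed-form infimum of Proposition 11 can be compared with a brute-force minimum of the coverage function over a grid. The sketch below is illustrative: the function names, the grid, and the numerical values in the usage note are assumptions made for this example. The coverage function is computed from the two-term decomposition used in the proof rather than from the piecewise formulas.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

def coverage(n, theta, eta, a, b):
    """p_n(theta) for the interval [hat_theta - a, hat_theta + b]."""
    root_n = n ** 0.5
    A, B, H, tau = root_n * a, root_n * b, root_n * eta, root_n * theta
    # Non-zero estimate: need -B <= Z <= A with Z outside [-H - tau, H - tau].
    lo, hi = max(-B, -H - tau), min(A, H - tau)
    overlap = PHI(hi) - PHI(lo) if lo < hi else 0.0
    selected = (PHI(A) - PHI(-B)) - overlap
    # Zero estimate: theta is covered iff -a <= theta <= b.
    zeroed = PHI(H - tau) - PHI(-H - tau) if -a <= theta <= b else 0.0
    return selected + zeroed

def infimal_coverage(n, eta, a, b):
    """Infimal coverage probability as stated in Proposition 11."""
    root_n = n ** 0.5
    if eta > a + b:
        return 0.0
    if a <= b:
        return PHI(root_n * (a - eta)) - PHI(-root_n * b)
    return PHI(root_n * a) - PHI(root_n * (eta - b))
```

Taking, for example, `n = 100`, `eta = 0.3`, `a = 0.4`, `b = 0.5` and minimizing `coverage` over a fine grid of `theta` values should give a number slightly above `infimal_coverage(100, 0.3, 0.4, 0.5)`, since the infimum is approached but not attained as `theta` tends to `-a` from the left.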

An immediate consequence of the above proposition is that $n^{1/2}\operatorname{diam}(C_n) =
n^{1/2}(a_n + b_n)$ is not less than $n^{1/2}\eta_n$, provided the infimal coverage probability
is positive. Hence, in case that $\eta_n \to 0$ and $n^{1/2}\eta_n \to \infty$, i.e., in case that
$\hat{\theta}_n$ is sparse, we see that $n^{1/2}\operatorname{diam}(C_n) \to \infty$, which of course just confirms
the general result obtained in Theorem 2 above. [In fact, this result is a bit
stronger as only the infimal coverage probabilities need to be positive, and not
their limes inferior.]

If the interval is symmetric, i.e., $a_n = b_n$ holds, and $a_n \ge \eta_n/2$ is satisfied,
the infimal coverage probability becomes $\Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-a_n + \eta_n))$. Since
this expression is zero if $a_n = \eta_n/2$, and is strictly increasing to one as $a_n$ goes to
infinity, any prescribed infimal coverage probability less than one is attainable.
Suppose $0 < \delta < 1$ is given. Then the (shortest) confidence interval $C_n$ of the
form $[\hat{\theta}_n - a_n, \hat{\theta}_n + a_n]$ with infimal coverage probability equal to $\delta$ has to satisfy
$a_n \ge \eta_n/2$ and

$$\Phi(n^{1/2}a_n) - \Phi(n^{1/2}(-a_n + \eta_n)) = \delta.$$

If now $\eta_n \to 0$ and $n^{1/2}\eta_n \to \infty$, i.e., if $\hat{\theta}_n$ is sparse, it follows that $n^{1/2}a_n \to \infty$
and

$$n^{1/2}(-a_n + \eta_n) \to \Phi^{-1}(1 - \delta)$$

or in other words that $a_n \ge \eta_n/2$ has to satisfy

$$a_n = \eta_n - n^{-1/2}\Phi^{-1}(1 - \delta) + o(n^{-1/2}). \tag{17}$$

Conversely, any $a_n \ge \eta_n/2$ satisfying (17) generates a confidence interval with
asymptotic infimal coverage probability equal to $\delta$. We observe that (17) shows
that $\kappa_n\operatorname{diam}(C_n) = 2\kappa_na_n \to \infty$ for any sequence $\kappa_n$ that satisfies $\kappa_n\eta_n \to \infty$,
which includes sequences that are $o(n^{1/2})$ by the assumptions on $\eta_n$. Hence,
this result is stronger than what is obtained from applying Theorem 2 (or its
Corollary) to this example, and illustrates the discussion in Remark 6.
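
The expansion (17) can be compared with the exact half-length obtained by solving the coverage equation numerically. The following is a minimal sketch using bisection, which is applicable because the infimal coverage probability is strictly increasing in $a_n$ on $[\eta_n/2, \infty)$; the function names, the tolerance, and the numerical values in the usage note are assumptions made for this example.

```python
from statistics import NormalDist

ND = NormalDist()  # standard normal distribution

def symmetric_infimal_coverage(n, eta, a):
    """Infimal coverage of [hat_theta - a, hat_theta + a], valid for a >= eta / 2."""
    root_n = n ** 0.5
    return ND.cdf(root_n * a) - ND.cdf(root_n * (eta - a))

def shortest_half_length(n, eta, delta, tol=1e-14):
    """Solve symmetric_infimal_coverage(n, eta, a) = delta for a by bisection."""
    lo, hi = eta / 2, eta  # coverage is zero at eta / 2
    while symmetric_infimal_coverage(n, eta, hi) < delta:
        hi *= 2            # enlarge the bracket until the target is exceeded
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if symmetric_infimal_coverage(n, eta, mid) < delta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def approximate_half_length(n, eta, delta):
    """Leading terms of (17): eta - n ** -0.5 * Phi^{-1}(1 - delta)."""
    return eta - n ** -0.5 * ND.inv_cdf(1 - delta)
```

For `n = 10**4`, `eta = 0.1` and `delta = 0.95` the two half-lengths should agree closely, and both exceed `eta`, so the half-length is governed by the threshold $\eta_n$ rather than by the usual $n^{-1/2}$ scale.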